Together, the autocorrelations at lags 1, 2, …, make up the autocorrelation function or ACF.
The plot is known as a correlogram.
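As a minimal sketch of what an ACF computation involves, the sample autocorrelation at lag k can be written out by hand in base R and compared with `stats::acf()`. The helper name `sample_acf` and the simulated series are illustrative assumptions, not part of the original slides.

```r
# Hypothetical helper: sample autocorrelations r_1, ..., r_max_lag
sample_acf <- function(y, max_lag) {
  n <- length(y)
  dev <- y - mean(y)          # deviations from the overall mean
  denom <- sum(dev^2)         # total sum of squares (lag-0 term)
  sapply(seq_len(max_lag), function(k) {
    sum(dev[(k + 1):n] * dev[1:(n - k)]) / denom
  })
}

set.seed(1)
y <- cumsum(rnorm(40))        # illustrative series (a random walk)
r_manual  <- sample_acf(y, 5)
r_builtin <- acf(y, lag.max = 5, plot = FALSE)$acf[-1]  # drop the lag-0 value
```

The two vectors should agree, which is one way to check the formula behind the correlogram.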
Autocorrelation
new_production %>% ACF(Beer) %>% autoplot()
r_{4} is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 4 quarters apart.
r_{2} is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
Trend and seasonality in ACF plots
When data have a trend, the autocorrelations for small lags tend to be large and positive.
When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency).
When data are trended and seasonal, you see a combination of these effects.
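These effects can be reproduced with simulated data. The sketch below is an illustration in base R only; the series, the trend slope, and the period of 12 are assumptions chosen for the example.

```r
set.seed(42)
t_idx <- 1:120
noise <- rnorm(120)
trended  <- 0.5 * t_idx + noise                    # linear trend plus noise
seasonal <- 5 * sin(2 * pi * t_idx / 12) + noise   # period-12 pattern plus noise

acf_trend  <- acf(trended,  lag.max = 24, plot = FALSE)$acf[-1]
acf_season <- acf(seasonal, lag.max = 24, plot = FALSE)$acf[-1]

round(acf_trend[1:3], 2)         # small lags: large and positive
round(acf_season[c(6, 12)], 2)   # half period: negative; full period: positive
```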
set.seed(30)
wn <- tsibble(t = 1:50, y = rnorm(50), index = t)
wn %>% autoplot(y)
White noise data is uncorrelated across time with zero mean and constant variance.
(Technically, we require independence as well.)
Example: White noise
wn %>% ACF(y)
 r_{1}   r_{2}   r_{3}   r_{4}   r_{5}   r_{6}   r_{7}   r_{8}   r_{9}  r_{10}
 0.014  -0.163   0.163  -0.259  -0.198   0.064  -0.139  -0.032   0.199  -0.024
Sample autocorrelations for white noise series.
Expect each autocorrelation to be close to zero.
Blue lines show 95% critical values.
Sampling distribution of autocorrelations
Sampling distribution of r_k for white noise data is asymptotically N(0,1/T).
Approximately 95% of all r_k for white noise should lie within \pm 1.96/\sqrt{T}.
If this is not the case, the series is probably not WN.
Common to plot lines at \pm 1.96/\sqrt{T} when plotting ACF. These are the critical values.
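A minimal sketch of the check, in base R: compute the bound for a series of length 50 and see which sample autocorrelations fall inside it. The variable names are illustrative; the sample size is called `n_obs` because `T` is an alias for `TRUE` in R.

```r
set.seed(30)
n_obs <- 50
y <- rnorm(n_obs)                         # simulated white noise
bound <- 1.96 / sqrt(n_obs)               # approximate 95% critical value
r <- acf(y, lag.max = 10, plot = FALSE)$acf[-1]
inside <- abs(r) <= bound                 # TRUE where r_k lies within the bounds
mean(inside)                              # share of the 10 lags inside the bounds
```

With T = 50 the bound is roughly 0.277, so a single spike just beyond the bounds is not unusual for white noise.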
Example: Pigs slaughtered
pigs <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs", year(Month) >= 2014)
pigs %>%
  autoplot(Count / 1000) +
  labs(y = "Thousands", title = "Number of pigs slaughtered in Victoria")
Example: Pigs slaughtered
pigs %>% ACF(Count) %>% autoplot()
Example: Pigs slaughtered
Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)
It is difficult to detect a pattern in the time plot.
The ACF shows significant autocorrelation at lags 2 and 12.
This indicates some slight seasonality.
These results show that the series is not white noise.
Let’s Try One
You can compute the daily changes in the Google stock price in 2018 using
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy <- tsibbledata::global_economy
print_retail <- aus_retail %>%
  filter(Industry == "Newspaper and book retailing") %>%
  group_by(Industry) %>%
  index_by(Year = year(Month)) %>%
  summarise(Turnover = sum(Turnover))
aus_economy <- filter(global_economy, Code == "AUS")
print_retail %>%
  left_join(aus_economy, by = "Year") %>%
  mutate(Adj_turnover = Turnover / CPI) %>%
  pivot_longer(c(Turnover, Adj_turnover), names_to = "Type", values_to = "Turnover") %>%
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  xlab("Years") +
  ylab(NULL) +
  ggtitle("Turnover: Australian print media industry") +
  hrbrthemes::theme_ipsum_rc()
Mathematical transformations
If the data show different variation at different levels of the series, then a transformation can be useful.
Denote original observations as y_1,\dots,y_n and transformed observations as w_1, \dots, w_n.
Transformations
Square root
w_t = \sqrt{y_t}
Cube root
w_t = \sqrt[3]{y_t}
Logarithm
w_t = \log(y_t)
Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
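A small numerical illustration of this interpretation, using made-up values that grow by 10% per step:

```r
y <- c(100, 110, 121)            # illustrative series growing 10% per step
log_change <- diff(log(y))       # change on the log scale, about 0.0953 per step
rel_change <- exp(log_change) - 1  # exact relative change on the original scale
```

The log difference is close to, but slightly smaller than, the relative change of 0.10; the approximation is best for small changes.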
Each of these transformations is close to a member of the family of Box-Cox transformations: w_t = \left\{\begin{array}{ll}
\log(y_t), & \quad \lambda = 0; \\
(y_t^\lambda-1)/\lambda , & \quad \lambda \ne 0.
\end{array}\right.
\lambda=1: (No substantive transformation)
\lambda=\frac12: (Square root plus linear transformation)
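The formula can be sketched directly in base R. The helper names `box_cox_tr` and `inv_box_cox_tr` are hypothetical and written only for this illustration; they assume strictly positive data.

```r
# Hypothetical helpers following the Box-Cox formula above (y must be positive)
box_cox_tr <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox_tr <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16, 25)
w_log  <- box_cox_tr(y, 0)      # identical to log(y)
w_half <- box_cox_tr(y, 0.5)    # 2 * (sqrt(y) - 1): square root plus linear transformation
w_one  <- box_cox_tr(y, 1)      # y - 1: a shift only, no substantive transformation
y_back <- inv_box_cox_tr(w_half, 0.5)  # reversing the transformation recovers y
```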
Simple transformations are easier to explain and work well enough.
Transformations can have a very large effect on prediction intervals (PI).
If the data contain zeros, then don’t take logs.
log1p() can be useful for data with zeros.
If some data are negative, no power transformation is possible unless a constant is added to all values.
Choosing logs is a simple way to force forecasts to be positive.
Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)
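To illustrate the points about zeros and about reversing a transformation, here is a short base R sketch using the functions `log1p()` and `expm1()`. The data values are invented for the example.

```r
y <- c(0, 3, 10, 0, 25)      # illustrative counts containing zeros
w_log <- log(y)              # -Inf wherever y is zero
w <- log1p(y)                # log(1 + y) stays finite at zero
y_back <- expm1(w)           # reverse the transformation: exp(w) - 1
```

Any forecast produced on the `w` scale would need the same `expm1()` step to return to the original scale.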
Try this out…
For the following series, find an appropriate transformation in order to stabilise the variance.
United States GDP from global_economy
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
Victorian Electricity Demand from vic_elec.
Gas production from aus_production
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Time series components
Time series patterns
Recall
Trend pattern exists when there is a long-term increase or decrease in the data.
Cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
Seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
A Note on DeSeasoning
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
dcmp <- us_retail_employment %>%
  model(STL(Employed))
autoplot(us_retail_employment, Employed, color = "gray") +
  autolayer(components(dcmp), season_adjust, color = "blue") +
  labs(y = "Persons (thousands)", title = "Total employment in US retail")
Moving Averages
The general idea is a moving window. We will set .before and .after as follows.
# A tsibble: 58 x 10 [1Y]
# Key: Country [1]
Country Code Year GDP Growth CPI Imports Exports Popul…¹ `5-MA`
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Austral… AUS 1960 1.86e10 NA 7.96 14.1 13.0 1.03e7 NA
2 Austral… AUS 1961 1.96e10 2.49 8.14 15.0 12.4 1.05e7 NA
3 Austral… AUS 1962 1.99e10 1.30 8.12 12.6 13.9 1.07e7 13.5
4 Austral… AUS 1963 2.15e10 6.21 8.17 13.8 13.0 1.10e7 13.5
5 Austral… AUS 1964 2.38e10 6.98 8.40 13.8 14.9 1.12e7 13.6
6 Austral… AUS 1965 2.59e10 5.98 8.69 15.3 13.2 1.14e7 13.4
7 Austral… AUS 1966 2.73e10 2.38 8.98 15.1 12.9 1.17e7 13.3
8 Austral… AUS 1967 3.04e10 6.30 9.29 13.9 12.9 1.18e7 12.7
9 Austral… AUS 1968 3.27e10 5.10 9.52 14.5 12.3 1.20e7 12.6
10 Austral… AUS 1969 3.66e10 7.04 9.83 13.3 12.0 1.23e7 12.6
# … with 48 more rows, and abbreviated variable name ¹Population
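The code that produced this table is not shown here. As a hedged sketch of the same idea in base R, a centred 5-term moving average (two observations before, two after) can be computed with `stats::filter()`. The `exports` vector below copies the rounded Exports values from the printout, so the results can differ slightly from the `5-MA` column.

```r
# Exports (% of GDP), 1960-1969, rounded values copied from the printout above
exports <- c(13.0, 12.4, 13.9, 13.0, 14.9, 13.2, 12.9, 12.9, 12.3, 12.0)

# Centred 5-term moving average with equal weights
ma5 <- as.numeric(stats::filter(exports, rep(1 / 5, 5), sides = 2))
round(ma5, 1)   # NA for the first two and last two years (incomplete window)
```

The missing values at both ends correspond to the `NA` entries in the first rows of the table.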
autoplot(aus_exports, Exports) +
  autolayer(aus_exports, `5-MA`, color = "red") +
  labs(y = "Exports (% of GDP)", title = "Total Australian exports") +
  guides(colour = guide_legend(title = "series")) +
  hrbrthemes::theme_ipsum_rc()
We can even have moving averages of moving averages.
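A common case is a 2×12-MA for monthly data: a 12-term moving average followed by a 2-term moving average, which centres the even-order window. The base R sketch below uses a simulated series (an assumption for illustration) and shows that the result equals a single weighted average with half weights on the two end points.

```r
set.seed(7)
x <- rnorm(60)                                           # illustrative monthly series

# 12-MA: with an even filter length, stats::filter() looks further forward than back
ma12 <- stats::filter(x, rep(1 / 12, 12), sides = 2)
# 2-MA of the 12-MA, averaging the current and previous value to centre the window
ma2x12 <- stats::filter(ma12, c(0.5, 0.5), sides = 1)

# Equivalent single 13-term weighted average
w <- c(1 / 24, rep(1 / 12, 11), 1 / 24)
direct <- stats::filter(x, w, sides = 2)
```

Six values are lost at each end of the series, since the combined window spans 13 observations.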
# A tsibble: 440 x 4 [1M]
Time Month Year Employed
<mth> <ord> <dbl> <dbl>
1 1978 Feb Feb 1978 5986.
2 1978 Mar Mar 1978 6041.
3 1978 Apr Apr 1978 6054.
4 1978 May May 1978 6038.
5 1978 Jun Jun 1978 6031.
6 1978 Jul Jul 1978 6036.
7 1978 Aug Aug 1978 6005.
8 1978 Sep Sep 1978 6024.
9 1978 Oct Oct 1978 6046.
10 1978 Nov Nov 1978 6034.
# … with 430 more rows
August 2014 employment numbers higher than expected.
Supplementary survey usually conducted in August for employed people.
Most likely, some employed people were claiming to be unemployed in August to avoid supplementary questions.
Supplementary survey not run in 2014, so no motivation to lie about employment.
In previous years, seasonal adjustment fixed the problem.
The ABS has now adopted a new method to avoid the bias.
Some Data for Today and General Considerations
Panel data. Multiple time series are often described as a panel, a cross-section of time series, or a time series of cross-sections. The data structure has two [non-overlapping] indices. Let’s review, and discuss a bit, what exactly we mean.
For much of the study of time series, the key issue is stationarity. For now, we will do some hand waving, to be clarified in Chapter 5 and further in Chapter 9: we want to compute things first and then build out all the details. Let’s take my new retail employment data.
library(tidyquant)
Ford <- tq_get("F", from = "2000-01-01")
FordT <- Ford %>%
  as_tsibble(index = date)
FordT %>%
  autoplot(adjusted)
FC <- Ford %>%
  tq_transmute(adjusted, mutate_fun = periodReturn, period = "monthly") %>%
  mutate(YM = yearmonth(date)) %>%
  as_tsibble(., index = YM)
FC %>%
  autoplot(monthly.returns)
Ford’s ACF
The 6/7 and 12/13 patterns are interesting….
library(patchwork)
FC1 <- FC %>% ACF(monthly.returns) %>% autoplot()
FC2 <- FC %>% PACF(monthly.returns) %>% autoplot()
FC1 + FC2
Education and Health Services: Health Care and Social Assistance         0.999  0.313   0  4  6.19e+04   63326      17.6  0.517  0.584
Financial Activities                                                     1.000  0.870   7  4  6.92e-01   78437    -273.1  0.723  0.879
Goods-Producing                                                          0.996  0.812   9  2  1.12e+04   43343  -64790.2  0.734  1.081
Government                                                               1.000  0.981  11  7  3.94e+02  190941  -19815.3  0.599  0.538
Government: Local Government                                             1.000  0.986   5  7  3.14e+02   96008  -15450.2  0.640  0.692
Government: Local Government Education                                   1.000  0.996   3  7  3.54e+01   54936   -8479.9  0.538  0.480
Leisure and Hospitality                                                  0.997  0.607   7  4  5.47e+05  136395   22427.3  0.522  0.604
Leisure and Hospitality: Accommodation and Food Services                 0.971  0.473   8  4  5.27e+06   31615   -1157.7  0.499  0.541
Leisure and Hospitality: Food Services and Drinking Places               0.975  0.404   6  4  3.26e+06   29889    -377.9  0.476  0.497
Manufacturing                                                            0.997  0.434   9  2  5.19e+03   -8507  -64155.3  0.772  1.262
Private Service-Providing                                                1.000  0.543   0  4  1.50e+07  906495  115965.6  0.545  0.623
Professional and Business Services                                       1.000  0.675  11  1  3.62e+03  187091   41544.7  0.626  0.810
Professional and Business Services: Administrative and Support Services  0.995  0.808  11  1  1.84e+04   21351   -8027.1  0.617  0.797
Professional and Business Services: Administrative and Waste Services    0.995  0.810  11  1  1.93e+04   22521   -8025.8  0.617  0.804
Professional and Business Services: Professional and Technical Services  1.000  0.691   2  5  1.81e+02   29125   -1380.0  0.626  0.719
Retail Trade                                                             1.000  0.881   0  4  5.78e+03  135654   -6877.1  0.511  0.473
Trade, Transportation, and Utilities                                     1.000  0.845   0  4  2.02e+04  211628   -7668.1  0.566  0.583
coef_hurst
A measure of the degree to which adjacent observations depend on one another over time. Generically, this statistic takes values between zero and one with one indicating very high levels of dependence through time.
FC %>% features(monthly.returns, features = feat_spectral)
# A tibble: 1 × 1
spectral_entropy
<dbl>
1 0.988
The Absence of Correlation
Ljung-Box modifies the idea in the Box-Pierce statistic for assessing whether or not a given series [or transformation thereof] is essentially uncorrelated. In both cases, we will get to the details next week [chapter 5]. For now, the idea is simply that, if the series is uncorrelated, the sample size T times the sum of the first k squared autocorrelations approximately follows a chi-squared distribution with k degrees of freedom. Large autocorrelations produce a large statistic and reveal dependence.
USET %>% features(Employed, features = list(box_pierce, ljung_box))
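To make the two statistics concrete, here is a hedged base R sketch that computes them by hand for a simulated white noise series and compares them with `stats::Box.test()`. The series, the choice of k = 10, and the variable names are assumptions for illustration; the sample size is called `n_obs` because `T` is an alias for `TRUE` in R.

```r
set.seed(123)
x <- rnorm(100)                     # simulated white noise
n_obs <- length(x)
k <- 10                             # number of lags included
r <- acf(x, lag.max = k, plot = FALSE)$acf[-1]

bp <- n_obs * sum(r^2)                                        # Box-Pierce
lb <- n_obs * (n_obs + 2) * sum(r^2 / (n_obs - seq_len(k)))   # Ljung-Box
p_lb <- pchisq(lb, df = k, lower.tail = FALSE)                # chi-squared p-value

bp_ref <- unname(Box.test(x, lag = k, type = "Box-Pierce")$statistic)
lb_ref <- unname(Box.test(x, lag = k, type = "Ljung-Box")$statistic)
```

A large p-value is consistent with an uncorrelated series; a small one points to dependence.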
The stationarity issue from earlier is given much attention. Can we reasonably think of the characteristics of a series as fixed over time? There are three means of assessment, with details deferred to Chapter 9.
USET %>%
  features(Employed, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs, unitroot_nsdiffs)) %>%
  knitr::kable(format = "html")
Title                 kpss_stat  kpss_pvalue  pp_stat  pp_pvalue  ndiffs  nsdiffs
Financial Activities       4.63         0.01   -1.193      0.100       1        1
Manufacturing              5.68         0.01   -0.938      0.100       1        0
Retail Trade               3.91         0.01   -2.636      0.089       1        1
FC %>% features(monthly.returns, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs, unitroot_nsdiffs))
USET %>%
  features(Employed, features = list(shift_level_max, shift_var_max, shift_kl_max)) %>%
  kable(format = "html")
Title                 shift_level_max  shift_level_index  shift_var_max  shift_var_index  shift_kl_max  shift_kl_index
Financial Activities              371                229          24037              233         0.299             227
Manufacturing                    1559                228         417020              235         0.522             227
Retail Trade                      777                226         788931              354         1.841             227
FC %>%
  features(monthly.returns, features = list(shift_level_max, shift_var_max, shift_kl_max)) %>%
  kable(format = "html")
shift_level_max  shift_level_index  shift_var_max  shift_var_index  shift_kl_max  shift_kl_index
          0.258                110          0.194              113          36.8             112
Crossings and Flat Spots
USET %>%
  features(Employed, features = list(n_crossing_points, longest_flat_spot)) %>%
  kable(format = "html")
Title                 n_crossing_points  longest_flat_spot
Financial Activities                  5                 40
Manufacturing                        11                 52
Retail Trade                         31                 10
FC %>%
  features(monthly.returns, features = list(n_crossing_points, longest_flat_spot)) %>%
  kable(format = "html")
n_crossing_points  longest_flat_spot
              129                  8
ARCH
What proportion of the current squared residual is explained by the prior squared residual? This reports R^2; if the variance explained is large, volatility is persistent. There is a chi-square statistic also.
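The idea can be sketched in base R as a regression of the squared, demeaned series on its own lag, reporting the R^2. This is an illustration under stated assumptions, not the packaged feature: the helper name `arch_r2`, the single lag, and the simulated series are all hypothetical, and the packaged feature may use a different number of lags.

```r
# Hypothetical helper: R^2 from regressing squared demeaned values on their lags
arch_r2 <- function(x, lags = 1) {
  sq <- (x - mean(x))^2            # squared demeaned series
  m <- embed(sq, lags + 1)         # column 1: current value; remaining columns: lagged values
  fit <- lm(m[, 1] ~ m[, -1])
  summary(fit)$r.squared
}

set.seed(99)
iid <- rnorm(300)                  # no volatility clustering by construction
r2_iid <- arch_r2(iid)
lm_stat <- (length(iid) - 1) * r2_iid                   # LM statistic: (rows used) * R^2
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)  # chi-squared with 1 df (one lag)
```

For an independent series the R^2 should be close to zero; values near one, as in the employment series, indicate that the squared deviations are highly persistent.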
USET %>% features(Employed, features = stat_arch_lm) %>% kable(format = "html")
Title                 stat_arch_lm
Financial Activities         0.989
Manufacturing                0.972
Retail Trade                 0.917
FC %>% features(monthly.returns, features = stat_arch_lm) %>% kable(format = "html")
stat_arch_lm
        0.05
The Box-Cox
USET %>% features(Employed, features = guerrero) %>% kable(format = "html")
Title                 lambda_guerrero
Financial Activities            0.948
Manufacturing                   1.037
Retail Trade                    1.186
FC %>% features(monthly.returns, features = guerrero) %>% kable(format = "html")